Methods predicting protein secondary structure 
improved substantially in the 1990s through the use 
of evolutionary information taken from the diver- 
gence of proteins in the same structural family. Re- 
cently, the evolutionary information resulting from 
improved searches and larger databases has again 
boosted prediction accuracy by more than four per- 
centage points to its current height of around 76% 
of all residues predicted correctly in one of the 
three states, helix, strand, and other. The past year 
also brought successful new concepts to the field. 
These new methods may be particularly interesting 
in light of the improvements achieved through sim- 
ple combining of existing methods. Divergent evo- 
lutionary profiles contain enough information not 
only to substantially improve prediction accuracy, 
but also to correctly predict long stretches of iden- 
tical residues observed in alternative secondary 
structure states depending on nonlocal conditions. 
An example is a method automatically identifying 
structural switches and thus finding a remarkable 
connection between predicted secondary structure 
and aspects of function. Secondary structure pre- 
dictions are increasingly becoming the workhorse 
for numerous methods aimed at predicting protein 
structure and function. Is the recent increase in 
accuracy significant enough to make predictions 
even more useful? Because the recent improvement 
yields a better prediction of segments, and in par- 
ticular of β strands, I believe the answer is affirma- 
tive. What is the limit of prediction accuracy? We 
shall see. © 2001 Academic Press 



INTRODUCTION 

History. Linus Pauling correctly guessed the for- 
mation of helices and strands (14, 15) (and falsely 
hypothesized other structures). Three years before 
Pauling's guess was verified by the publications of 
the first X-ray structures (16, 17), one group had 
already ventured to predict secondary structure 
from sequence (18). The first-generation prediction 
methods following in the 1960s and 1970s were all 
based on single amino acid propensities (19). The 
second-generation methods dominating the scene 
until the early 1990s used propensities for segments 
of 3-51 adjacent residues (19). Basically any imag- 
inable theoretical algorithm had been applied to the 
problem of predicting secondary structure from se- 
quence. However, it seemed that prediction accuracy 
stalled at levels slightly above 60% (percentage of 
residues predicted correctly in one of the three 
states: helix, strand, and other). The reason for this 
limit was the restriction to local information. Can 
we introduce some global information into local 
stretches of residues? 

Secondary structure prediction profits from diver- 
gence. Early on, Dickerson et al. (20) realized that 
information contained in multiple alignments can 
improve predictions. Zvelebil et al. (21) incorporated 
this concept into an automatic prediction method. 
However, the breakthrough of the third-generation 
methods to levels above 70% accuracy required a 
combination of larger databases with more ad- 
vanced algorithms (19, 22). The major component of 
these new methods was the use of evolutionary in- 
formation. All naturally evolved proteins with more 
than 35% pairwise identical residues over more than 
100 aligned residues have similar structures (23). 
This seemingly implies an amazing stability of 
structure with respect to sequence divergence. How- 
ever, this average figure hides the fact that neutral 
mutations are extremely unlikely. Supposedly most 
mutations result in proteins that will not adopt any 
globular structure, at all. In other words, only a tiny 
fraction of all possible proteins exist. Hence, posi- 
tion-specific profiles describing which residues can 
be exchanged against which others at which posi- 
tions contain crucial information about protein 
structure. One consequence is that stretches of say 
17 adjacent residues implicitly contain some infor- 
mation about long-range interactions and environ- 
ment since the profile reflects evolutionary con- 
straints. Using evolutionary divergence was the 
start key to the third-generation prediction meth- 




REVIEW: PROTEIN SECONDARY STRUCTURE PREDICTION CONTINUES TO RISE 205 







FIG. 1. Profile-based searches extend evolutionary information. The cloud signifies a protein structural family for the query protein 
U, i.e., all proteins that have a similar 3D structure. A simple pairwise comparison of U with all other proteins covers the "safe zone" of 
sequence alignment (gray circle around U). This zone can be defined, e.g., by BLAST scores below 10^-10 or by more than 35% pairwise 
identical residues over long alignments. Assume that there are only five other proteins (small white circles) in the safe zone falling on one 
side of U. For example, PSI-BLAST starts the next iteration with the family-specific profile given by the proteins found in the safe zone. 
Searching the database again with this profile reaches safely into the twilight zone (zone reached marked by double-lined egg indicated 
in figure). However, no current method generally reaches all members of family U. Furthermore, in particular for PSI-BLAST the new 
region may fall outside of the initial safe zone (black subregion of the safe zone). Finally, the regions that could have been reached by 
sequence-space hopping or intermediate sequence searches (dashed circles around five initial hits; (120, 121)) are not entirely covered by 
the profile-based search. The tricky bit is to avoid the possibility that the profile will pick unrelated proteins (transparent egg) and thus 
connect two separate structural families (U and X). Conclusions: (i) Iterated PSI-BLAST searches can safely identify fairly divergent 
family members. (ii) Close homologues may be lost during the extension of the family. (iii) The advanced search can lead the results astray. 



ods. Knowing 3D structure, 1 we can identify very 
distant relationships between proteins that would 
improve accuracy even further (24). Can we build 
larger and more diverged families without knowing 
structure? 
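
To illustrate what a position-specific profile contains, the following minimal Python sketch counts residue frequencies per alignment column and computes the pairwise percentage of identical residues. It is not the procedure of any method reviewed here: the function names, the toy alignment, and the unweighted counts are assumptions made for illustration, and real profile methods typically add sequence weighting and pseudocounts.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment):
    """Per-column residue frequencies for equal-length aligned sequences.

    Gaps ('-') and unknown symbols are ignored when normalising a column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def percent_identity(seq_a, seq_b):
    """Percentage of identical residues over positions aligned without gaps."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (four sequences, four columns).
toy_alignment = ["ACDE", "ACDF", "AC-E", "GCDE"]
profile = column_profile(toy_alignment)
print(profile[0]["A"], profile[3]["E"])
print(percent_identity("ACDE", "GCDE"))
```

Each row of the profile describes which residues were tolerated at that position during evolution; this is the kind of information that a window of adjacent profile rows passes to a prediction method.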

1 Abbreviations used: 3D structure, three-dimensional (coordi- 
nates of protein structure); 1D structure, one-dimensional (e.g., 
sequence or string of secondary structure); ASP, method identi- 
fying regions of structure ambivalent in response to global 
changes (1); DSSP, database and method converting 3D coordi- 
nates into secondary structure (2); HMMSTR, hidden Markov 
model-based prediction of secondary structure (3); JPred, method 
combining other prediction methods (4, 5); JPred2, divergent 
profile (PSI-BLAST)-based neural network prediction (6); PHD, 
simple profile-based neural network prediction (7); PHDpsi, di- 
vergent profile (PSI-BLAST)-based neural network prediction (7, 
8); PROF, divergent profile-based neural network prediction 
trained and tested with PSI-BLAST (9); PSI-BLAST, gapped and 
iterative specific profile-based, fast and accurate alignment 
method (10); PSIPRED, divergent profile (PSI-BLAST)-based 
neural network prediction (11); SAM-T99sec, neural network pre- 
diction, using hidden Markov models as input (12); SSpro, profile- 
based advanced neural network prediction method (13). 



New database searches extend family divergence. 
It was also recognized very early on that information 
from the position-specific evolutionary exchange 
profile of a particular protein family facilitates dis- 
covering more distant members of that family (20). 
Automatic database search methods successfully 
used position-specific profiles for searching (25). 
However, the breakthrough for large-scale routine 
searches was achieved with the development of PSI- 
BLAST (10) and hidden Markov models (12, 26). In 
particular, the gapped, profile-based, and iterated 
search tool PSI-BLAST continues to revolutionize 
the field of protein sequence analysis through its 
unique combination of speed and accuracy. More 
distant relationships are found through iteration 
starting from the safe zone of comparisons and in- 
truding deeply and reliably into the twilight zone 
(Fig. 1). 

Topics left out here. This review focuses on meth- 
ods predicting secondary structure for globular pro- 
teins, in general. At the infancy of analyzing the 
proteome of entirely sequenced organisms, the most 
useful structure prediction methods are those that 
focus on particular classes of proteins, such as pro- 
teins containing membrane helices and coiled-coil 
regions (27-30). For predicting the topology of heli- 
cal membrane proteins, a number of new methods 
add interesting new facets (31-36). However, no 
method has truly used the flood of recent experimen- 
tal information about membrane proteins (37). 
Overall, membrane helices can be predicted much 
more accurately than globular helices. The current 
state of the art is to correctly predict all membrane 
helix topology for more than 80% of the proteins and 
to falsely predict membrane helices for less than 4% 
of all globular proteins. We have recently come 
across evidence suggesting that this figure overesti- 
mates performance (Rost, unpublished). Clearly, 
methods developed to predict helices in globular pro- 
teins go completely wrong for membrane helices! In 
contrast, porins appear to be predicted relatively 
accurately by methods developed for globular pro- 
teins (38, 39). Few methods specifically predicting 
coiled-coil regions have been published recently (old- 
er review in (40)). Two interesting developments are 
the prediction of the dimeric state of coiled-coils (41) 
and a method predicting 3D structure for coiled-coil 
regions (42). In fact, the latter is the only existing 
method predicting 3D structure below 2-Å main 
chain deviation over more than 30 residues. Another 
example of successful specialized secondary struc- 
ture prediction methods is the focus on β turns (43, 
44). The method from the Thornton group appears to 
be the most accurate current means of predicting 
turns. Successful methods specialized in predicting 
α-helix propensities have resulted from the experi- 
mental studies of short peptides in solution (45, 46). 
Neither the turn nor the helix-in-solution methods 
have yet been combined with other secondary struc- 
ture prediction methods. 

MORE DATA + REFINED SEARCH = BETTER 
PREDICTION 

Jones broke through by using PSI-BLAST 
searches of large databases. David Jones pioneered 
the use of iterated PSI-BLAST searches automati- 
cally (11). The most important step achieved by the 
resulting method PSIPRED has been the detailed 
strategy of avoiding pollution of the profile through 
unrelated proteins (Fig. 1). To avoid this trap, the 
database searched must be filtered first (11). At the 
CASP meeting at which David Jones introduced 
PSIPRED, Kevin Karplus and colleagues presented 
their prediction method (SAM-T99sec), finding more 
diverged profiles through hidden Markov models 
(47, 48). Recently, Cuff and Barton also successfully 
used PSI-BLAST alignments for JPred2 (see 49). 



Jennings et al. (50) explore an alternative to increas- 
ing divergence: they started with a safe zone align- 
ment through ClustalW (51) and HMMer (26) and 
iteratively refined the alignment using the second- 
ary structure prediction from DSC (52). The result- 
ing alignment is reported to be more accurate and to 
yield higher prediction accuracy than the initial 
ClustalW/HMMer alignments (50). How accurate is 
secondary structure prediction in 2000? 

Prediction accuracy peaks at 76% accuracy. The 
current best methods reach a level of 76% three- 
state per-residue accuracy (Table I). This constitutes 
a sustained level more than four percentage points 
above the last century's best method not using di- 
verged profiles (PHD in Table I). Fortunately, the 
improvement is valid for helix, strand, and nonregu- 
lar regions (information and correlation indices in 
Table I). Furthermore, significantly fewer residues 
are confused between the states helix and strand 
(BAD score, Table I). Finally, some new methods 
also improve in a more global sense by improving 
the accuracy of assigning the secondary structural 
class (all-alpha, all-beta, alpha/beta, and other) 
based on the predicted content of regular secondary 
structure (Class score, Table I). 
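
The per-state correlation and the helix/strand confusion mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the EVA implementation: the function names and the toy strings are assumptions, and the normalisation of the confusion score (confused residues divided by all observed helix and strand residues) is my assumption, since the text only describes the score in words.

```python
import math


def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one state (e.g. 'H', 'E' or 'L')."""
    tp = tn = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        if pred == state and obs == state:
            tp += 1
        elif pred == state:
            fp += 1
        elif obs == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bad_score(observed, predicted):
    """Percentage of observed helix/strand residues predicted as the other
    regular state (assumed normalisation, see lead-in)."""
    regular = sum(obs in "HE" for obs in observed)
    confused = sum(
        (obs == "H" and pred == "E") or (obs == "E" and pred == "H")
        for obs, pred in zip(observed, predicted)
    )
    return 100.0 * confused / regular if regular else 0.0


# Hypothetical toy example: H = helix, E = strand, L = other.
observed = "HHHHEEEELLLL"
predicted = "HHHEEEEELLLH"
print(matthews(observed, predicted, "H"))
print(bad_score(observed, predicted))
```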

Sources of improvement: Four parts database 
growth, three parts extended search, two parts other. 
Jones proposed two causes for the improved accu- 
racy: (i) training and (ii) testing the method on PSI- 
BLAST profiles. Cuff and Barton examined in detail 
how different alignment methods improve (6). How- 
ever, which fraction of the improvement results from 
the mere growth of the database, which fraction 
results from using more diverged profiles, and which 
fraction results from training on larger profiles? Us- 
ing PHD from 1994 to separate the effects (8), we 
first compared a noniterative standard BLAST (53) 
search against SWISS-PROT (54) with one against 
SWISS-PROT + TrEMBL (54) + PDB (55). The 
larger database improves performance by about two 
percentage points (8). Second, we compared the 
standard BLAST against the large database with an 
iterative PSI-BLAST search. This yielded less than 
two percentage points in additional improvement 
(8). Thus, overall, the more divergent profile search 
against today's databases supposedly improves any 
method using alignment information by almost four 
percentage points (PHDpsi in Table I). The improve- 
ment gained by using PSI-BLAST profiles to develop 
the method is relatively small: PHDpsi was trained 
on a small database of not very divergent profiles in 
1994; e.g., PROF was trained on PSI-BLAST profiles 
of a 20 times larger database in 2000. The two differ 
by only one percentage point (Table I), and part of 

Accuracy of Secondary Structure Prediction Methods a 

a Data set and sorting: The results are compiled by EVA (58). All methods for which details are listed have been tested on 195 different 
new protein structures (EVA version February 2001). None of these proteins was similar to any protein used to develop the respective 
method. This set comprised the largest such set by February 1, 2001, for which we had results. Sorting and grouping reflect the following 
concept: if the data set is too small to distinguish between two methods, these two are grouped. For the given set of 195 proteins, this 
yielded three groups. Inside of each group, results are sorted alphabetically. Due to a lack of data, I could not add the performance of 
SAM-T99sec (48); on a set of 105 proteins SAM-T99sec appears comparable to the best three methods: PSIPRED, SSpro, and PROF. The 
results from the Copenhagen method are set apart, since they were not collected continuously by EVA (the method is not publicly 
available); rather they were provided by the group in Denmark for this review and thus may have been based on marginally differing 
sequence databases. 

b See abbreviations footnote in text; Copenhagen refers to the method from the group in Denmark (63); Wang/Yuan refers to a method 
predicting secondary structural class from the amino acid composition, which may be the most accurate such method (59). 

c Three-state per-residue accuracy, i.e., number of residues predicted correctly in one of the three states, helix, strand, or other 
(conversion of DSSP states (HG) → helix, (EB) → strand; note that the per-residue accuracy tends to favour methods overpredicting 
nonregular structure). 

d Three-state per-residue accuracy published in original publication of method: PSIPRED (11), SSpro (13), JPred2 (6), PHD (122). 

e Three-state per-segment score measuring the overlap between predicted and observed segments (75, 123). 

f Per-residue information content (22). 

g Matthews correlation coefficient for state helix (124). 

h Matthews correlation coefficient for state strand (124). 

i Matthews correlation coefficient for state other (124). 

* Percentage of proteins correctly sorted into one of the four classes: all-alpha (length > 60, helix >45%, strand <5%), all-beta (length > 
60, helix <5%, strand >45%), alpha/beta (length > 60, helix >30%, strand >20%), other (thresholds for classification from (122, 125, 126)). 

l Percentage of helical residues predicted as strand and of strand residues predicted as helix (127). 
m PSIPRED results were published for different conversions of the eight DSSP states to three states. 
" P. 

o The class accuracy for the method based on amino acid composition is taken from the original publication (59), i.e., based on a different 
data set than all other methods. 



this difference resulted from implementing new con- 
cepts into PROF (Rost, unpublished; 9). 

CAUTION: OVEROPTIMISM HAS BECOME EVEN 
MORE LIKELY! 2 

Seemingly improving accuracy by ignoring short 
segments. There are many ways to publish higher 
levels of accuracy. Among the simplest for secondary 
structure prediction is to convert 3₁₀ helices and β 
bulges assigned by DSSP (2) to nonregular struc- 
ture. This yields higher levels of accuracy since all 
methods — on average — are better at predicting the 
middle of helices and strands than their caps and 
hence are more accurate for longer regular second- 
ary structure segments (56, 57). When predicted 
secondary structure is used to predict 3D structure, 

2 Note: I added this section listing "what not to do" primarily 
for developers of methods, since many of the recently published 
methods fall prey to one of the problems mentioned. 



short helices are important. Thus, I suggest bearing 
with the more conservative conversion strategy. 
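
The effect of the conversion can be made concrete with a minimal Python sketch of three-state per-residue accuracy under two mappings of DSSP states. The conservative mapping follows the conversion given in the footnotes to Table I ((HG) to helix, (EB) to strand); the reduced mapping, the function names, and the toy strings are assumptions chosen for illustration only.

```python
# Conservative conversion: (H, G) -> helix, (E, B) -> strand, all else -> other (L).
CONSERVATIVE = {"H": "H", "G": "H", "E": "E", "B": "E"}
# Reduced conversion (assumed here): G and B are counted as "other".
REDUCED = {"H": "H", "E": "E"}


def to_three_states(dssp, mapping):
    """Map a string of DSSP states onto the three states H, E, L."""
    return "".join(mapping.get(state, "L") for state in dssp)


def q3(observed, predicted):
    """Three-state per-residue accuracy in percent."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(obs == pred for obs, pred in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Hypothetical DSSP string and a prediction that misses the short segments.
dssp = "HHHHGGG-EEEEB-TT"
predicted = "HHHHLLLLEEEELLLL"

print(q3(to_three_states(dssp, CONSERVATIVE), predicted))
print(q3(to_three_states(dssp, REDUCED), predicted))
```

For this toy case the same prediction scores 75% under the conservative conversion and 100% under the reduced one, although nothing about the prediction has changed.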

Comparing apples and oranges or too few apples 
with one another. To overstate the point: there is 
NO value in comparing methods evaluated on dif- 
ferent data sets. Most secondary structure predic- 
tion methods are available. Thus, developers may 
want to compare their results to public methods 
based on the same data set (not previously used for 
either of the two). Many methods predicting aspects 
of protein structure and function must fight with 
limited data availability. This is not at all the case 
for secondary structure prediction. Hundreds of new 
protein structures are added every year (55). If for 
some reason or another, small data sets must be 
used, developers should painstakingly try to esti- 
mate what "significant difference" means for their 
data set. For example, 16 new protein structures are 
clearly too few! We currently have results from 
many prediction methods for 16 proteins. For that 
set, JPred2, PHD, PROF, PSIPRED, SAM-T99sec, 
and SSpro are indistinguishable (58)! 
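
One crude way to estimate what "significant difference" means for a given set size is the standard error of the mean per-protein accuracy. The Python sketch below is only a back-of-the-envelope illustration: it assumes independent proteins, a per-protein standard deviation of ten percentage points (a hypothetical value), and a one-standard-error criterion that is my assumption, not the rule used by EVA.

```python
import math


def standard_error(per_protein_sd, n_proteins):
    """Standard error of the mean accuracy over n independent proteins."""
    return per_protein_sd / math.sqrt(n_proteins)


def distinguishable(mean_a, mean_b, per_protein_sd, n_proteins):
    """True if two mean accuracies differ by more than one standard error
    (a deliberately crude, assumed criterion)."""
    return abs(mean_a - mean_b) > standard_error(per_protein_sd, n_proteins)


# Hypothetical numbers: two methods at 76% and 74%, spread of 10 points.
for n in (16, 195):
    print(n, round(standard_error(10.0, n), 2), distinguishable(76.0, 74.0, 10.0, n))
```

Under these assumptions a two-point difference cannot be resolved on 16 proteins, whereas it can on a set of 195.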

Seemingly achieve 100% accuracy by using corre- 
lated sets. Many publications on predicting second- 
ary structural class from amino acid composition 
allowed correlations between "training" and testing 
sets. Consequently, levels of prediction accuracy 
published far exceeded the possible theoretical mar- 
gins (59). A very simple operational definition for 
"independent sets" is the following: Two proteins A 
and B are correlated if the sequence similarity be- 
tween A and B suffices to predict the structure of B 
knowing A's structure. Assume we have two uncor- 
related sets of proteins S1 and S2. Can we train the 
method on set S1 and develop it on set S2 without 
further ado? While developing PROF, I realized that 
the answer is negative. In fact, I trained neural 
networks on about 2000 structures that had no sig- 
nificant level of sequence similarity to our original 
set of 126 proteins (22). I used the 126 proteins only 
after I had completed developing the method and 
found a prediction accuracy exceeding 80% (unpub- 
lished). When I tested PROF on a set of about 200 
new structures that had been added to PDB in the 
meantime (different from that given in Table I), 
prediction accuracy dropped. Do the 126 proteins 
differ from the set used for Table I? I failed to an- 
swer this question. Conclusion: test as test can; i.e., 
use as many independent sets of new structures as 
possible! 

EVA: Automatic evaluation of automatic predic- 
tion servers. In collaboration with Volker Eyrich 
(Columbia), Marc Marti-Renom and Andrej Sali 
(both from Rockefeller), and Florencio Pazos and 
Alfonso Valencia (both from CNB Madrid), we have 
started to address the above problems through the 
automatic server EVA (58). Leszek Rychlewski 
(IIMCB Warsaw) and Dani Fischer (Ben-Gurion 
University) are implementing similar ideas in Live- 
Bench (60). The simple concept is the following: 
Take the N newest experimental structures added to 
PDB, send the sequences to all prediction servers, 
collect the results, and accumulate a continuous 
evaluation of prediction accuracy every week. EVA 
has been evaluating secondary structure prediction 
methods for more than 6 months now. I found it 
instructive to see how the "ranking" of methods ini- 
tially changed from week to week due to too small 
sets. Currently, EVA also provides results for eval- 
uating comparative modeling (Sali group) and resi- 
due-residue contacts (Valencia group). We hope that 
EVA will eventually simplify life for developers, ref- 
erees, editors, and users. 

CLEVER METHODS CAN BE MORE ACCURATE 

SSpro: Advanced recursive neural network system. 
The only method published recently that appears to 
improve prediction accuracy significantly not 
through more divergent profiles but through the 
particular algorithm is SSpro (13). The major idea of 
the method aims at solving the following problem. 
When, e.g., training neural networks it is important 
to avoid correlations between training samples pre- 
sented successively to the system. A neural network 
may be presented with the window around residue 
11 in protein X at time step T and residue 7 in 
protein Y at step T + 1. Thus, the system never 
learns that secondary structure correlates between 
adjacent residues. The result is that regular second- 
ary structure segments are predicted — on aver- 
age — at a length half that observed (19). PHD ad- 
dressed this problem by a second-level structure-to- 
structure network that was trained on the predicted 
secondary structure from the first-level sequence-to- 
structure network (22). Most authors have since im- 
plemented this idea (in particular PSIPRED and 
JPred2). Pierre Baldi and colleagues deviated sub- 
stantially from this concept. Instead of using an 
additional network, they embedded the correlation 
into one single recursive neural network. In princi- 
ple, the idea of a recursive network had been imple- 
mented before (61). However, the particular details 
of the algorithm implemented in SSpro are novel 
and — as Table I illustrates — prove highly success- 
ful. 
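
The window-based input discussed above can be sketched as follows. This minimal Python example only shows how each residue is turned into one sample consisting of its sequence neighbourhood; the function name, the padding symbol, and the default width of 17 (echoing the stretch length mentioned in the Introduction) are assumptions, and no network is trained here.

```python
def windows(sequence, width=17, pad="-"):
    """Return one window per residue, centred on that residue.

    Positions beyond the chain ends are filled with the padding symbol.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so that the window has a centre")
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


# First level: windows over the sequence (or over profile rows).
print(windows("ABCDE", width=3))

# Second level (structure-to-structure): the same windows, but taken over
# the three-state string predicted by the first level (hypothetical here).
first_level_prediction = "LHHHHLLEEEL"
print(windows(first_level_prediction, width=5)[0])
```

Because samples from different proteins are presented in shuffled order during training, a single window-to-state network sees each window in isolation; the second level, or a recursive architecture as in SSpro, is what reintroduces the correlation between neighbouring residues.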

HMMSTR: Hidden Markov models for connecting 
library of structure fragments. Can we predict sec- 
ondary structure for protein U by local sequence 
similarity to segments of known structures {S} even 
when overall U differs from any of the known struc- 
tures {S}? Yes, as shown by many nearest-neighbor- 
based prediction methods, the most successful of 
which seems to be NSSP (62). A conceptually quite 
different realization of the same concept has been 
implemented in HMMSTR by Chris Bystroff, David 
Baker, and colleagues (3). First, build a library of 
local stretches (3-19) of residues with "basic struc- 
tural motifs" (I-sites). Second, assemble these local 
motifs through hidden Markov models introducing 
structural context on the level of supersecondary 
structure. Thus, the goal is to predict protein struc- 
ture through identification of "grammatical units of 
protein structure formation." Although HMMSTR 
intrinsically aims at predicting higher order aspects 
of 3D structure, a side result is the prediction of 1D 
secondary structure. I find two results surprising. (i) 
The authors do not find any significant effect of 
"overoptimizing" their method; i.e., HMMSTR ap- 
pears as accurate in predicting secondary structure 
for proteins known today as it will be for those 
known next year. (ii) Three-state per-residue accu- 
racy is reported to be about 74% (3). If this estimate 
is correct, HMMSTR is more accurate at predicting 
secondary structure than most existing methods and 
almost as accurate as the state-of-the-art methods 
(Table I). 

And the winner is? The reason for the particular 
focus of this review on a small number of methods is 
largely that I could compare the selected methods to 
one another based on new proteins. A particular 
method that was not available to me may turn out to 
mark the most substantial breakthrough in the 
field. A Danish group developed a neural network- 
based method that is most amazing in many re- 
spects (63). (i) The authors estimate the method to 
yield levels above 77% prediction accuracy (the title 
of their article is slightly misleading). If true, this is 
the best current method. Like PSIPRED, JPred2, 
and PROF, the method uses PSI-BLAST profiles as 
input and, like most methods since PHD, a two-level 
approach addressing the problem of predicting short 
segments. (ii) A concept that had not been published 
before is to replace the standard three output units 
(for helix, strand, and other) by nine output units 
additionally coding for the secondary structure 
states of the residues before and after the central 
one (dubbed "output expansion"). (iii) Also new is the 
particular way of weighting the average over differ- 
ent networks by the overall reliability of the predic- 
tion for that network and the mere number of dif- 
ferent networks considered (up to 800!). This 
impressive number of networks may prevent large- 
scale genome analyses based on this method. How- 
ever, the major point is: Did the authors overesti- 
mate performance? The authors tested their method 
in a way that most developers would assume to be 
error-proof. However, their testing protocol is very 
similar to the one that I applied when significantly 
overestimating the accuracy of PROF (>81%). Obvi- 
ously, the similarity of these two situations may 
very well be purely coincidental! 
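
To make "output expansion" concrete, here is a minimal Python sketch of how nine target values per residue could be encoded: three for the residue before, three for the central residue, and three for the residue after. This is an illustration of the idea as described above, not the Danish group's code; the ordering of the units and the all-zero encoding beyond the chain ends are assumptions.

```python
STATES = "HEL"  # helix, strand, other


def one_hot(state):
    """Three target values with a 1.0 at the position of the given state."""
    return [1.0 if state == s else 0.0 for s in STATES]


def expanded_targets(structure):
    """Nine target values per residue: states at positions i-1, i and i+1.

    Neighbours beyond the chain ends are encoded as all zeros (assumption).
    """
    targets = []
    for i in range(len(structure)):
        vector = []
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(structure):
                vector.extend(one_hot(structure[j]))
            else:
                vector.extend([0.0, 0.0, 0.0])
        targets.append(vector)
    return targets


print(expanded_targets("HEL")[1])
```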

Plethora of new concepts for secondary structure 
prediction. The following five methods are a small 
subset of new ideas explored to improve secondary 
structure prediction. (i) Ouali and King (64) combine 
neural networks and rule-based statistics in a cas- 
cade of classifiers. Based on a similar data set they 
estimate a level of prediction accuracy comparable 
to that of JPred2 (see Table I). (ii) Chandonia and 
Karplus (57) combined simplified output schemes 
(two output states) with networks trained on differ- 
ent tasks and a particular variant of early stopping; 
input is nondivergent alignments picked from the 
safe zone (Fig. 1). Based on a protocol similar to that 



applied by the Danish group (63), the authors esti- 
mate a level of >76% accuracy, i.e., a level that if it 
holds up is similar to SSpro (Table I). (iii) Suppos- 
edly the simplest new method that claims to almost 
approach the performance of PHD combines the in- 
formation for secondary structure formation con- 
tained in amino acid singlets, doublets, and triplets. 
(iv) Schmidler et al. (65) use a simple statistical 
model; the novel aspect is to replace compiling sta- 
tistics over fixed stretches of N residues by segments 
signifying regular secondary structure (helix, 
strand). The underlying formalism resembles a hid- 
den semi-Markov model allowing one to explicitly 
incorporate particular propensities such as helix 
caps (66). Based on noncomparable data sets the 
authors estimated prediction accuracy to be 69%; if 
correct, this is impressive for a method not using 
alignment information. (v) Without claims to sur- 
prising levels of accuracy, Figureau et al. (67) com- 
bine cleverly chosen pentapeptides from the data- 
base to obtain the final prediction. 

Secondary structural class predicted almost as ac- 
curately as by experiment. Grouping proteins into 
secondary structure classes (all-alpha, all-beta, al- 
pha/beta, and other) appears to be a useful initial 
approach for classifying proteins (27, 68). Surpris- 
ingly, such classes can be predicted successfully 
based merely on the overall amino acid composition 
of a protein (59, 69, 70). Increasingly complex and 
ingenious methods address this reduced 
goal; reported levels of prediction accuracy approach 
100%. Recently, Wang and Yuan explained these 
high values by insufficient testing schemes and 
argued that a four-state accuracy of 60% is 
the maximum for methods based solely on composi- 
tion (59). Obviously, it is much easier to predict class 
starting from the detailed information about evolu- 
tionary profiles for the entire sequence than by re- 
stricting the input to composition. In fact, the best 
current methods also improve the accuracy in pre- 
dicting secondary structure class considerably (Ta- 
ble I). The differences between observed and pre- 
dicted composition of secondary structure are now 
below 6% for helix and strand. This is fairly close to 
what experimental low-resolution (circular dichro- 
ism, Fourier transform infrared spectroscopy) meth- 
ods achieve at their best (57). 
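
Assigning the structural class from predicted secondary structure content can be sketched directly from the thresholds listed in the footnotes to Table I. The Python function below is a minimal illustration; its name and the handling of empty input are assumptions.

```python
def structural_class(three_state):
    """Sort a protein into all-alpha, all-beta, alpha/beta or other.

    Input is a string over H (helix), E (strand) and L (other); thresholds
    follow the Table I footnote (length > 60 and content in percent).
    """
    length = len(three_state)
    if length == 0:
        return "other"
    helix = 100.0 * three_state.count("H") / length
    strand = 100.0 * three_state.count("E") / length
    if length > 60:
        if helix > 45 and strand < 5:
            return "all-alpha"
        if helix < 5 and strand > 45:
            return "all-beta"
        if helix > 30 and strand > 20:
            return "alpha/beta"
    return "other"


# Hypothetical predicted strings of 100 residues each.
print(structural_class("H" * 50 + "L" * 50))
print(structural_class("H" * 35 + "E" * 25 + "L" * 40))
```

Applied to the observed and to the predicted string of the same protein, the function yields the two labels whose agreement the class score measures.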

COMBINING MEDIOCRE AND GOOD METHODS 
MAY BE BEST 

Combination improves on nonsystematic errors. 
Any prediction method has two sources of errors: (i) 
systematic errors, e.g., through nonlocal effects, and 
(ii) white noise errors caused by, e.g., the succession 
of the examples during training neural networks. 
Theoretically, combining any number of methods 
improves accuracy as long as the errors of the indi- 
vidual methods are mutually independent and are 
not only systematic (71). PHD — and more recently 
other methods (6, 57, 63) — used this fact in combin- 
ing different neural networks. The idea of combining 
different prediction methods has been around in 
secondary structure prediction for a long time (19); 
Cuff and Barton (see 4, 5) implemented it in JPred 
for different third-generation methods. In particu- 
lar, JPred uses a simple expert rule for compiling 
the final average. King et al. (72) have tested a 
variety of different combination strategies. Selbig et 
al. (73) have compiled the jury through an elabo- 
rated decision-tree-based system. Guermeur et al. 
(74) have used a more refined variant of the JPred 
idea of weighting methods. Overall, combinations of 
independent prediction methods seem to yield levels 
of accuracy higher than that of the single best 
method. However, for every protein one method 
tends to be clearly superior to the combined predic- 
tion (Fig. 2B). Is it really wise to include signifi- 
cantly inferior methods into a combined prediction? 
No: averaging over all methods used for EVA de- 
creased accuracy over the best individual methods, 
although averaging over the better ones was better 
than averaging the best ones (Rost, unpublished 
results). Is there any criterion for when to include a 
method and when not to do so? Concepts weighting 
the individual methods based on their accuracy and 
"entropy" (63) appear successful only for large num- 
bers of methods (63; Rost, unpublished results). 
Nevertheless, methods that are significantly over- 
trained can improve when combined (Krogh, unpub- 
lished results). More rigorous studies for the optimal 
combination may provide a better picture. The tech- 
nical problem of utilizing many methods in a public 
server is that the field is advancing too fast: today's 
methods are more accurate than averages over yes- 
terday's methods (hence the JPred server now re- 
turns JPred2 results by default). 
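
The simplest form of the combination idea discussed above can be sketched in a few lines. This is a minimal illustration, not the jury used by JPred or by any other published method: the function name `combine_predictions`, the per-residue score dictionaries, and the optional weights are all assumptions made for the example.

```python
from typing import Dict, Optional, Sequence

STATES = "HEL"  # helix, strand, other (three-state alphabet assumed here)


def combine_predictions(
    per_method: Sequence[Sequence[Dict[str, float]]],
    weights: Optional[Sequence[float]] = None,
) -> str:
    """Average per-residue state scores over methods and pick the top state.

    per_method[m][i] maps each state to the score that (hypothetical)
    method m assigns to residue i. Weights default to a plain average.
    """
    if not per_method:
        raise ValueError("need at least one method")
    n_res = len(per_method[0])
    if any(len(m) != n_res for m in per_method):
        raise ValueError("all methods must cover the same residues")
    if weights is None:
        weights = [1.0] * len(per_method)
    total = float(sum(weights))

    combined = []
    for i in range(n_res):
        # Weighted mean score of each state at residue i.
        avg = {
            s: sum(w * m[i].get(s, 0.0) for w, m in zip(weights, per_method)) / total
            for s in STATES
        }
        combined.append(max(STATES, key=avg.get))
    return "".join(combined)
```

Setting a weight to zero excludes a method, which is one way to explore whether including a clearly inferior method helps or hurts on a given test set.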

WHAT DOES 76% ACCURACY MEAN, IN PRACTICE? 

Your protein may be predicted worse or better than 
average. A few problems in estimating expected 
prediction accuracy are described above. However, 
another problem is relevant for users of prediction 
methods: A sustained level of 76% accuracy does 
NOT mean that 76% of the residues in your protein 
of unknown structure U are correctly predicted. In 
contrast, prediction accuracy varies substantially 
between proteins (Fig. 2A). It seems that such vari- 
ations are intrinsic to any method predicting aspects 
of protein structure and function. What can you then 
expect as accuracy for your protein when using a 
state-of-the-art method? Given a divergent family 



(Table II), the answer is 66-86%. Do you learn from
comparing different methods? 
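
The difference between an average accuracy and the accuracy for one protein can be made concrete with a small sketch. It assumes predictions and observations are given as equal-length strings over the three states; the function names `q3` and `per_protein_summary` are chosen for illustration only.

```python
from statistics import mean, pstdev
from typing import Iterable, Tuple


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues predicted in the correct one of three states."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_protein_summary(pairs: Iterable[Tuple[str, str]]):
    """Return (mean, standard deviation, minimum, maximum) of per-protein Q3."""
    scores = [q3(p, o) for p, o in pairs]
    return mean(scores), pstdev(scores), min(scores), max(scores)
```

Reporting the spread alongside the mean is what separates a statement about a method from a statement about your protein.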

Combining methods improves on average but you 
may also lose. Averaging over many methods 
helps, on average. However, most often some meth- 
ods are more accurate than the average (Fig. 2B). 
Furthermore, there are examples of proteins pre- 
dicted poorly by all methods (Fig. 2B), i.e., for which 
all methods agree by mistake (data not shown). 
Thus, trying to use many methods may not provide 
the answer to the question whether the prediction 
for your protein is more likely to be below or above 
average. Are there alternative ways to spot more 
reliably predicted regions? 

More reliable predictions are more accurate. Reliability
indices as provided by most methods correlate
very well with prediction accuracy (Fig. 3). This
implies that you can easily identify regions that are 
more likely to be predicted accurately than others. 
Furthermore, if your protein has many residues pre- 
dicted at low levels of reliability, you may correctly 
suspect that your protein is predicted at a level 
below average. Plotting coverage versus accuracy 
(Fig. 3) also illustrates how beneficial more diver- 
gent profiles are to make predictions more useful. 
For example, PSIPRED has more than half of all 
residues predicted at levels that would be reached 
on average when comparing two known structures 
(75) (Fig. 3, dotted line). 
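
A coverage-versus-accuracy curve of the kind described above can be computed as follows. This is a sketch under stated assumptions: the reliability index is taken to be an integer from 0 (low) to 9 (high) per residue, and the function name `coverage_vs_accuracy` is hypothetical rather than part of any prediction server.

```python
from typing import List, Sequence, Tuple


def coverage_vs_accuracy(
    predicted: str, observed: str, reliability: Sequence[int]
) -> List[Tuple[int, float, float]]:
    """For each threshold 9..0, return (threshold, coverage %, accuracy %).

    Coverage is the share of residues with reliability >= threshold;
    accuracy is the share of those residues predicted correctly.
    """
    n = len(predicted)
    if n == 0 or len(observed) != n or len(reliability) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    curve = []
    for threshold in range(9, -1, -1):
        kept = [i for i in range(n) if reliability[i] >= threshold]
        if not kept:
            continue  # no residue reaches this reliability level
        correct = sum(predicted[i] == observed[i] for i in kept)
        curve.append((threshold, 100.0 * len(kept) / n, 100.0 * correct / len(kept)))
    return curve
```

Reading the curve from the top threshold downward shows how much accuracy is traded for each additional share of residues covered.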

ARE SECONDARY STRUCTURE PREDICTIONS 
USEFUL, IN PRACTICE? 

Regions likely to undergo structural change predicted
successfully. Young et al. (1) have unraveled
an impressive correlation between local secondary 
structure predictions and global conditions. The au- 
thors monitor regions for which secondary structure 
prediction methods give equally strong preferences 
for two different states. Such regions are processed 
combining simple statistics and expert rules. The 
final method is tested on 16 proteins known to un- 
dergo structural rearrangements and on a number 
of other proteins. The authors report no false posi- 
tives and identify most known structural switches. 
Subsequently, the group applied the method to the 
myosin family, identifying putative switching re- 
gions that were not known before, but appeared to 
be reasonable candidates (76). I find this method 
most remarkable in two ways: (i) it is the most 
general method using predictions of protein struc- 
ture to predict some aspects of function and (ii) it 
illustrates that predictions may be useful even when 
structures are known (as in the case of the myosin 
family). 
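
The first step described above, monitoring regions with equally strong preferences for two states, might be sketched as below. This is not the published method: the thresholds `margin`, `min_strength`, and `min_len` are invented for illustration, and the statistics and expert rules that the authors apply afterward are not reproduced.

```python
from typing import Dict, List, Sequence, Tuple


def ambiguous_regions(
    scores: Sequence[Dict[str, float]],
    margin: float = 0.1,
    min_strength: float = 0.35,
    min_len: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs with two competing states.

    A residue is flagged when its two best state scores differ by at most
    `margin` and the second best is at least `min_strength`.
    """
    flags = []
    for s in scores:
        top, second = sorted(s.values(), reverse=True)[:2]
        flags.append(top - second <= margin and second >= min_strength)

    regions, start = [], None
    # A trailing False closes a run that extends to the last residue.
    for i, flagged in enumerate(flags + [False]):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    return regions
```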
FIG. 2. Prediction accuracy varies substantially for different proteins. All results are based on 150 novel protein structures not used
to develop any of the methods shown (58). The considerable difference in the three-state accuracy between different proteins is valid for 
all methods (A, percentage of all 150 proteins predicted at a given level of accuracy; one standard deviation is on the order of 10 percentage 
points). On average, different methods predict different proteins at higher levels (B, for each protein and each method, the difference
between the per-protein average over all six methods and the accuracy of that method is shown; negative values imply that the respective method is better than the
average). Conclusions: (i) If you predict secondary structure for your protein with a method of 76% accuracy, the actual accuracy for that 
protein may be anywhere between 50 and 90%. (ii) As to be expected: most often some methods are more accurate than the average over 
many methods. 



Classifying proteins based on secondary structure 
predictions in the context of genome analysis. Pro- 
teins can be classified into families based on pre- 
dicted and observed secondary structure (27, 68). 
However, such procedures have been limited to a
very coarse-grained grouping that is only exceptionally useful
for inferring function (Table II). Nevertheless, in
particular, predictions of membrane helices and 
coiled-coil regions are crucial for genome analysis. 
Recently, we came across an observation that may 
have important implications for structural genom- 
ics, in particular: More than one-fifth of all eukary- 
otic proteins appeared to have regions longer than 
60 residues apparently lacking any regular second- 
ary structure (77). Most of these regions were not of 
low complexity, i.e., not composition-biased. Sur- 
prisingly, these regions appeared evolutionarily as 
conserved as all other regions in the respective pro- 
teins. This application of secondary structure pre- 
diction may aid in classifying proteins, in separating 
domains, and possibly even in identifying particular 
functional motifs. 
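
Scanning a predicted three-state string for such long regions without regular secondary structure is straightforward. The sketch below assumes that "other" is encoded as `L` and interprets "longer than 60 residues" as at least 61 consecutive positions; the function name is hypothetical, and the original analysis (77) may have applied additional filters.

```python
import re
from typing import List, Tuple


def long_nonregular_regions(
    prediction: str, min_len: int = 61, other: str = "L"
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs of `other` with >= min_len residues."""
    pattern = re.compile(re.escape(other) + "{%d,}" % min_len)
    return [(m.start(), m.end() - 1) for m in pattern.finditer(prediction)]
```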



Aspects of protein function predicted based on ex- 
pert analysis of secondary structure. The typical 
scenario in which secondary structure predictions 
facilitate learning about function is one in which 
experts combine their predictions and their intu- 
ition, most often to find similarities to proteins of 
known function but insignificant sequence similar- 
ity (39, 78-89). Usually, such applications are based 
on very specific details about predicted secondary 
structure (some examples are shown in Table II). 
Thus, these successful correlations of secondary 
structure and function appear difficult to incorpo- 
rate into automatic methods. 

Exploring secondary structure predictions to im- 
prove database searches. Initially, three groups in- 
dependently applied secondary structure predic- 
tions for fold recognition, i.e., the detection of 
structural similarities between proteins of unrelated 
sequences (90-92). A few years later, almost every 
other fold recognition/threading method has 
adopted this concept (93-102). Two recent methods 
TABLE II

Using Secondary Structure Predictions, in Practice

How to obtain the best results?

    The major source of improvement is the divergence of the multiple sequence
    alignment used for prediction. Thus, if you have a small family, the
    expected prediction accuracy is lower.

    Particularly sensitive to divergence are the reliability indices; i.e., less
    divergence yields overestimated reliability indices.

    The most successful strategy to find the most reliably predicted regions may
    be to use the reliability index provided by a method rather than the
    agreement between different methods.

    If you know there are nonglobular or structural domains in your protein,
    chop it up before you build the alignment.

    If you can improve the alignment, try to do so before the prediction.

Identify membrane proteins?

    Predicted membrane helices indicate that your protein is not globular. The
    accurate membrane predictions are usually more reliable than those for
    globular proteins. Thus, membrane helix predictions should be given
    preference. Globular methods often do not predict globular helices at
    positions of membrane helices; rather, often membrane helices are
    predicted as strand by mistakenly applied globular methods. In contrast,
    globular methods appear relatively more accurate for porin-like beta-strand
    membrane regions.

    Detection of membrane proteins has less than a 3% error rate for the best
    methods. Most helices are correctly predicted, yet the number of helices
    may nevertheless vary. Helix caps are clearly predicted inaccurately. Note
    that general methods predicting three-state secondary structure for
    globular proteins also predict caps less accurately.

Classify through coiled-coil regions?

    Predictions of long coiled-coil regions clearly indicate that your protein is
    locally nonglobular. Long coiled-coil proteins are likely to be structural
    proteins. Longer regions are predicted more accurately.

Classify through secondary structure content?

    Classifying proteins according to the secondary structure composition is
    helpful, but arbitrary. One hope may be to infer from the predicted
    secondary structure content that a particular protein is not typical.
    However, this attempt fails, since known protein structures vary
    significantly between 10 and 90% of regular secondary structure (helix,
    strand). Thus, secondary structure composition does not help to predict
    globularity.

Identify domains or structural regions?

    If you see two separate secondary structure patterns, you may suspect that
    the protein has two structural domains. An extreme example is an
    N-terminal all-alpha region and a C-terminal all-beta region.

    If you have to cut your protein, stay more than two residues away from
    predicted helices and strands.

Monitor influences of point mutations?

    Secondary structure prediction methods are, on average, as accurate in
    predicting the overall content of secondary structure as are careful CD
    and FTIR methods. However, such methods allow you to monitor in detail
    structural responses to mutations. Such changes are less likely to be
    reflected as accurately by prediction methods.

Find binding sites or motifs?

    Most often, binding sites lie in nonregular secondary structure elements.
    For example, we have not predicted regular secondary structure for any
    of the known nuclear localization signals (128).

    Secondary structure predictions do not suffice to identify binding motifs,
    such as the zinc-finger II motif. However, the combination of sequence
    motif and predicted secondary structure may be very helpful.

Infer functional/structural similarity?

    If you know the function/structure of protein A and want to infer whether B
    shares this function/structure, a similarity in the local secondary
    structure may help you substantially.



extended the concept by not only refining the data- 
base search, but by actually refining the quality of 
the alignment through an iterative procedure (50, 
103). A related strategy has been employed by Ng
and the Henikoffs to improve predictions and align- 
ments for membrane proteins (104). 



From 1D predictions to 2D and 3D structure. Are
secondary structure predictions accurate enough to
help predict higher order aspects of protein structure
automatically? For 2D (interresidue contacts)
predictions, Baldi et al. (105) have recently improved
the level of accuracy in predicting β-strand
FIG. 3. Prediction accuracy correlates with reliability. The conclusion from Fig. 2A is that you have a poor idea of how well a method
performs when applied to your protein of unknown structure. Fortunately, there is a way out of this dilemma: Most methods now provide 
an index measuring the reliability of the prediction for each residue. Shown is the accuracy versus the cumulative percentages of residues 
predicted at a given level of reliability (coverage vs accuracy). For example, PSIPRED and PROF reach a level above 88% for about 60% 
of all residues (dashed line). This particular line is chosen since secondary structure assignments by DSSP agree to about 88% for proteins 
of similar structure. Although JPred2 is only marginally less accurate than PSIPRED and PROF (Table I), it reaches this level of accuracy 
for less than half of all residues. Conclusions: (i) Reliability indices are extremely valuable to spot regions of more-likely-to-be-correct 
predictions. (ii) These indices also address the problem of variation: if many residues are predicted with high reliability, your protein is
more likely to be predicted more accurately than average (Fig. 2A). 



pairings over earlier work (106) by using another 
elaborate neural network system. For 3D predic- 
tions, the following list of five groups exemplifies 
that secondary structure predictions are now a popular
first step toward predicting 3D structure: (i)
Ortiz et al. (107) successfully use secondary structure
predictions as one component of their 3D structure
prediction method; (ii) Eyrich et al. (108, 109)
minimize the energy of arranging predicted rigid
secondary structure segments; (iii) Lomize et al.
(110) also start from secondary structure segments;
(iv) Chen et al. (111) suggest using secondary structure
predictions to reduce the complexity of molecular
dynamics simulations; (v) Levitt and co-workers
(see 112, 113) combine secondary structure-based 
simplified presentations with a particular lattice 
simulation attempting to enumerate all possible 
folds. 

AND WHAT IS THE LIMIT OF PREDICTION 
ACCURACY? 

88% is a limit, but shall we ever come close to
it? Protein secondary structure formation is influenced
by long-range interactions (45, 46, 114) and
by the environment (1, 115). Consequently, 



stretches of up to 11 adjacent residues (dubbed cha- 
meleon after (114)) can be found in different second- 
ary structure states (116-118). Implicitly, such non- 
local effects are contained in the exchange patterns 
of protein families. This is reflected by the fact that 
strand is predicted almost as accurately as helix 
(Table I), although sheets are stabilized by more 
nonlocal interactions than helices. Local profiles can 
even suffice to identify structural switches (1, 76). 
Surprisingly, we can find some traces of folding 
events in secondary structure predictions (119). 
Even more amazing is a study suggesting that align- 
ment-based methods achieve levels of accuracy for 
chameleon regions similar to those for all other re- 
gions (118). Secondary structure assignments may 
vary for two versions of the same structure. One 
reason is that protein structures are not rocks but 
dynamic objects with some regions being more mo- 
bile than others. Another reason is that any assign- 
ment method must choose particular thresholds 
(e.g., DSSP chooses a cut-off in the Coulomb energy 
of a hydrogen bond). Consequently, assignments dif- 
fer by about 5-15 percentage points between differ- 
ent X-ray versions or different NMR models for the 
same protein (Andersen and Rost, unpublished results),
and by about 12 percentage points between
structural homologues (75). The latter number pro- 
vides the upper limit for secondary structure predic- 
tion of error-free comparative modeling. I doubt that 
ab initio predictions of secondary structure will ever 
become more accurate than that. Hence, I believe a 
value of around 88% constitutes an operational up- 
per limit for prediction accuracy. After the advances 
over the past 2 years we reached greater than 76% 
accuracy. Thus, we need to achieve another 12 per- 
centage points (or even less). What is the major 
obstacle to reaching another 6 percentage points 
higher? The size of the experimental database as 
suggested (117)? I doubt this, since PHDpsi trained 
on only 200 proteins using PSI-BLAST input is al- 
most as accurate as PSIPRED trained on 2000 pro- 
teins (Table I). Will the current explosion of se- 
quences boost accuracy? In fact, current databases 
have less than 10 homologues for more than one 
third of the 150 proteins tested (Table I) and more 
than 100 for only 20% of the proteins. Although 
based on too small a set to draw conclusions, for
these 20% highly populated families the accuracy of 
PROF was 4 percentage points above average (data 
not shown). Thus, larger databases may get us 6
percentage points higher, or they may not. The answer
remains nebulous.

DISCUSSION 

Methods improved significantly over the past 2 
years. Growing databases and improved search 
techniques (Fig. 1) — predominantly through the it- 
erated PSI-BLAST tool — yielded a substantial im- 
provement in secondary structure prediction accu- 
racy over the past 2 years. State-of-the-art methods 
now reach sustained levels of 76% prediction accu- 
racy (Table I). Even more impressively, about 60% of 
all residues are predicted at levels reaching the level 
of agreement between X-ray and NMR structures 
(Fig. 3). However, novel ideas have also been shown 
to improve prediction accuracy. A standard way to 
increase the confidence in a particular prediction is 
to look at the results from many different prediction 
methods. This strategy is frequently successful and 
has been brought to perfection over recent years. 
However, often the best method is better than the 
average over many methods (Fig. 2B). While struc- 
ture prediction is coming of age, developers and us- 
ers slowly learn to reduce overestimations. How- 
ever, the correlations between proteins at times of 
database explosions are becoming more difficult to 
control. It seems that only continuous, automatic 
evaluation servers will be able to handle this chal- 
lenge in the future (58, 60). 



Secondary structure predictions are at the base of 
structure-based sequence analysis. Almost a de- 
cade after the original breakthrough, prediction 
methods are now increasingly explored by wet-lab 
biologists to analyze their protein of interest. Sec- 
ondary structure predictions are used automatically 
by methods aiming at higher dimensional aspects of 
protein structure and at improving database 
searches and alignment accuracy. One method has 
successfully related secondary structure predictions 
automatically to functional aspects (1, 76). However, 
secondary structure-based identifications of binding 
sites or other functional aspects are still restricted to 
single-case expert analyses. 

And now we run human? The field has advanced 
considerably over the past 2 years, and more im- 
provement appears to lie ahead. Prediction methods 
are fast enough to analyze entire genomes, and for 
particular examples the resulting classifications are 
relevant to structural and functional genomics (28, 
68). Nevertheless, to play the devil's advocate: The 
field is not up to the challenge of the human sequences
to be dumped into the database very soon.
We are missing a variety of approaches relating 
secondary structure predictions explicitly to func- 
tion, such as given by ASP (1). Obviously, this re- 
mark may apply to bioinformatics, in general: The 
year 2001 will commence with the publication of the 
entire human genome; we must rush to get ready for 
the data flood. 